Skip to content

Conversation

@charles-zablit
Copy link
Contributor

@charles-zablit charles-zablit commented Nov 18, 2025

This patch improves the way lldb checks if the terminal it's opened in (if any) supports Unicode or not.

On POSIX systems, we check if LANG contains UTF-8.

On Windows, we always return true since we use the WriteToConsoleW api.

@charles-zablit charles-zablit requested review from adrian-prantl and removed request for JDevlieghere November 18, 2025 19:45
@llvmbot llvmbot added the lldb label Nov 18, 2025
@charles-zablit charles-zablit self-assigned this Nov 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Nov 18, 2025

@llvm/pr-subscribers-lldb

Author: Charles Zablit (charles-zablit)

Changes

This patch improves the way lldb checks if the terminal it's opened in (if any) supports Unicode or not.

On POSIX systems, we check if LANG contains UTF-8.

On Windows, we check if we are in the Windows Terminal app.


Full diff: https://github.com/llvm/llvm-project/pull/168603.diff

2 Files Affected:

  • (modified) lldb/include/lldb/Utility/DiagnosticsRendering.h (+11)
  • (modified) lldb/source/Utility/DiagnosticsRendering.cpp (+16-1)
diff --git a/lldb/include/lldb/Utility/DiagnosticsRendering.h b/lldb/include/lldb/Utility/DiagnosticsRendering.h
index dd33d671c24a5..3dfa9b0880f70 100644
--- a/lldb/include/lldb/Utility/DiagnosticsRendering.h
+++ b/lldb/include/lldb/Utility/DiagnosticsRendering.h
@@ -64,6 +64,17 @@ void RenderDiagnosticDetails(Stream &stream,
                              bool show_inline,
                              llvm::ArrayRef<DiagnosticDetail> details);
 
+/// Returns whether or not the current terminal supports Unicode rendering.
+///
+/// The value is cached after the first computation.
+///
+/// On POSIX systems, we check if the LANG environment variable contains the
+/// substring "UTF-8";
+///
+/// On Windows, we check that we are running from the Windows Terminal
+/// application.
+bool TerminalSupportsUnicode();
+
 class DiagnosticError
     : public llvm::ErrorInfo<DiagnosticError, CloneableECError> {
 public:
diff --git a/lldb/source/Utility/DiagnosticsRendering.cpp b/lldb/source/Utility/DiagnosticsRendering.cpp
index 8c21e661ce764..6ee0e1b3e04f5 100644
--- a/lldb/source/Utility/DiagnosticsRendering.cpp
+++ b/lldb/source/Utility/DiagnosticsRendering.cpp
@@ -102,7 +102,7 @@ void RenderDiagnosticDetails(Stream &stream,
   // characters.  In the future it might make sense to move this into
   // Host so it can be customized for a specific platform.
   llvm::StringRef cursor, underline, vbar, joint, hbar, spacer;
-  if (stream.AsRawOstream().colors_enabled()) {
+  if (TerminalSupportsUnicode()) {
     cursor = "˄";
     underline = "˜";
     vbar = "│";
@@ -232,4 +232,19 @@ void RenderDiagnosticDetails(Stream &stream,
     }
 }
 
+bool TerminalSupportsUnicode() {
+  static std::optional<bool> result;
+  if (result)
+    return result.value();
+#ifndef _WIN32
+  if (const char *lang_var = std::getenv("LANG"))
+    result = std::string(lang_var).find("UTF-8");
+  else
+    result = false;
+#else
+  result = std::getenv("WT_SESSION") != nullptr;
+#endif
+  return result.value();
+}
+
 } // namespace lldb_private

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 33171 tests passed
  • 491 tests skipped

// Since there is no other way to find this out, use the color
// attribute as a proxy for whether the terminal supports Unicode
// characters. In the future it might make sense to move this into
// Host so it can be customized for a specific platform.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to move this into Host now? I'm generally not a fan of ifdef code in Utility if it can be helped, but I understand if it would be a large amount of work otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make sense to move this to Host, done in the latest commit 👍

@charles-zablit charles-zablit force-pushed the charles-zablit/lldb/better-unicode-check branch from 70a77fc to a0aef7e Compare November 18, 2025 21:41
Copy link
Collaborator

@DavidSpickett DavidSpickett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like a citation for the WT_SESSION variable.

And it is a bit worrying that one of the top results for that name is one of the Windows Terminal devs (I'm pretty sure they are) telling a project not to use it Textualize/rich#140.

There's a small handful of issues with detecting WT_SESSION and using that as an indicator that you can use advanced VT:

Though in our case, we're not looking for those features specifically so it's probably ok.

I also found microsoft/terminal#13006. Not sure if that applies here, even if it does, I don't know of an alternative way to detect the terminal.

There is microsoft/terminal#1040 but it's still open.

We can certainly say that if we find WT_SESSION, it's definitely Windows terminal. If we don't find it, maybe it is, but we'll be cautious and not emit Unicode anyway.

@charles-zablit
Copy link
Contributor Author

I'd like a citation for the WT_SESSION variable.

And it is a bit worrying that one of the top results for that name is one of the Windows Terminal devs (I'm pretty sure they are) telling a project not to use it Textualize/rich#140.

I used what is-unicode-supported uses, which is a very (50M weekly downloads/week) popular NPM package: https://github.com/sindresorhus/is-unicode-supported/blob/e0373335038856c63034c8eef6ac43ee3827a601/index.js#L11

After some additional testing, both Windows Terminal and Windows Console Host display unicode correctly in lldb. This is on Windows 11, not sure how this works on 10 and earlier. We use WriteConsoleW since #156469, so maybe the Windows code path should always return true instead of checking for the WT_SESSION env var?

@charles-zablit charles-zablit force-pushed the charles-zablit/lldb/better-unicode-check branch from a0aef7e to 41676ba Compare November 19, 2025 12:35
@DavidSpickett
Copy link
Collaborator

I used what is-unicode-supported uses, which is a very (50M weekly downloads/week) popular NPM package:

Good enough for me 👍

We use WriteConsoleW since #156469, so maybe the Windows code path should always return true instead of checking for the WT_SESSION env var?

If you feel confident enough in your testing, sure.

This PR is fine as is though. Worst case it's too cautious and it can be changed later. I suspect the majority of people will be using Windows Terminal by now and if they're not, their first instinct would probably be to see if it solves the issue.

@charles-zablit
Copy link
Contributor Author

If you feel confident enough in your testing, sure.

This PR is fine as is though. Worst case it's too cautious and it can be changed later. I suspect the majority of people will be using Windows Terminal by now and if they're not, their first instinct would probably be to see if it solves the issue.

On Windows 11, I have tried:

  • Windows Terminal
  • Console Host (cmd.exe "old version" and powershell)
  • Git Bash

The arrow character of the error message was displayed correctly each time.

Another concern might be the code page because according to https://stackoverflow.com/a/15828853, the default codepage (437) does not support all Unicode characters. It does support the one we use however.

I will update the doc of the method.

@DavidSpickett
Copy link
Collaborator

My questions/comments were answered, I leave it to @bulbazord to approve or not.

// Host so it can be customized for a specific platform.
llvm::StringRef cursor, underline, vbar, joint, hbar, spacer;
if (stream.AsRawOstream().colors_enabled()) {
if (TerminalSupportsUnicode()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just double-check that, for example, https://lldb.llvm.org/cpp_reference/classlldb_1_1SBCommandReturnObject.html#aa655d205a4ebafb610458fe88068d333 is unaffected by this change? IIRC, the structured error data bypasses this function entirely, but I want to make sure there are no unintended side effects.

charles-zablit added a commit that referenced this pull request Nov 19, 2025
NFC patch which moves `DiagnosticsRendering` from `Utility` to `Host`.

This refactoring is needed for
#168603. It adds a method to
check whether the current terminal supports Unicode or not. This will be
OS dependent and a better fit for `Host`. Since `Utility` cannot depend
on `Host`, `DiagnosticsRendering` must live in `Host` instead.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 19, 2025
NFC patch which moves `DiagnosticsRendering` from `Utility` to `Host`.

This refactoring is needed for
llvm/llvm-project#168603. It adds a method to
check whether the current terminal supports Unicode or not. This will be
OS dependent and a better fit for `Host`. Since `Utility` cannot depend
on `Host`, `DiagnosticsRendering` must live in `Host` instead.
/// On Windows, we always return true since we use the `WriteConsoleW` API
/// internally. Note that the default Windows codepage (437) does not support
/// all Unicode characters. This function does not check the codepage.
bool TerminalSupportsUnicode();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this a static in the class, e.g. Terminal::SupportsUnicode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

nekoshirro pushed a commit to nekoshirro/Alchemist-LLVM that referenced this pull request Nov 24, 2025
NFC patch which moves `DiagnosticsRendering` from `Utility` to `Host`.

This refactoring is needed for
llvm/llvm-project#168603. It adds a method to
check whether the current terminal supports Unicode or not. This will be
OS dependent and a better fit for `Host`. Since `Utility` cannot depend
on `Host`, `DiagnosticsRendering` must live in `Host` instead.
Signed-off-by: Hafidz Muzakky <[email protected]>
@charles-zablit charles-zablit force-pushed the charles-zablit/lldb/better-unicode-check branch from 58fc9bb to 19cad71 Compare November 24, 2025 18:02
aadeshps-mcw pushed a commit to aadeshps-mcw/llvm-project that referenced this pull request Nov 26, 2025
NFC patch which moves `DiagnosticsRendering` from `Utility` to `Host`.

This refactoring is needed for
llvm#168603. It adds a method to
check whether the current terminal supports Unicode or not. This will be
OS dependent and a better fit for `Host`. Since `Utility` cannot depend
on `Host`, `DiagnosticsRendering` must live in `Host` instead.
Priyanshu3820 pushed a commit to Priyanshu3820/llvm-project that referenced this pull request Nov 26, 2025
NFC patch which moves `DiagnosticsRendering` from `Utility` to `Host`.

This refactoring is needed for
llvm#168603. It adds a method to
check whether the current terminal supports Unicode or not. This will be
OS dependent and a better fit for `Host`. Since `Utility` cannot depend
on `Host`, `DiagnosticsRendering` must live in `Host` instead.
@charles-zablit
Copy link
Contributor Author

Replying to @JDevlieghere from here:

Will this be correct for Xcode and DAP?

Xcode does not support Unicode in the lldb console. The env variable LANG is not defined in this shell, this patch won't change the behavior in Xcode.

Screenshot 2025-12-02 at 16 46 54

DAP does not seem to use the same code path as lldb for diagnostics rendering either. This patch won't change the behavior either, although VSCode supports Unicode in the debugging console, we could implement support for it.

Screenshot 2025-12-02 at 16 48 14

Copy link
Member

@JDevlieghere JDevlieghere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@charles-zablit charles-zablit merged commit 64e1991 into llvm:main Dec 4, 2025
10 checks passed
charles-zablit added a commit to charles-zablit/llvm-project that referenced this pull request Dec 4, 2025
@charles-zablit
Copy link
Contributor Author

This patch introduces bot failures. Opened a revert here: #170711

charles-zablit added a commit that referenced this pull request Dec 4, 2025
…orts Unicode (#168603) (#170711)

This reverts commit 64e1991.

#168603 breaks the CI because
the bots support Unicode but the tests expect non Unicode characters.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Dec 4, 2025
…rminal supports Unicode (#168603) (#170711)

This reverts commit 64e1991.

llvm/llvm-project#168603 breaks the CI because
the bots support Unicode but the tests expect non Unicode characters.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants